ABSTRACT

The work pursued under this grant dealt with artificial neural networks and other
discrete/continuous models. New bounds were obtained for sample complexity for identification
of static and dynamic concept classes defined by static and recurrent networks. Structural and
system-theoretic properties were characterized, leading to effective tests for identifiability and
other properties. Related models of hybrid systems were also studied; an equivalence problem
for PL systems was shown to be decidable in polynomial time, and a general Maximum
Principle was established for hybrid systems.

1 Introduction


The work pursued under this grant was centered on artificial neural networks and other
discrete/continuous models for computation and systems.

For neural networks, we focused on foundational theoretical results, in the light of which
algorithms used in applications (such as adaptive control, pattern recognition, or fault detection)
can be compared and evaluated. For instance, our work on Vapnik-Chervonenkis dimension
allows a precise quantification of the amount of data needed in order to reliably generalize
from samples, in a learning or adaptive control application, and our work on identifiability
permits an understanding of multiple minima in cost functions associated to numerical fitting.
We also continued our work on system and control theoretic questions associated to systems
obtained by combining “saturation” sigmoidal devices which interconnect to other such devices
via excitatory and inhibitory links. Towards the latter part of the grant period, we turned our
attention to “spiking” neuronal models and their signal processing capabilities, as well as to
the limitations imposed by noise on the computational capabilities of networks.

We also studied other hybrid models of systems and computation as part of this project. 
This area, broadly speaking, deals with the interface between continuous and discrete devices
(such as digital computers) used in symbolic processing. In this context, we continued the 
development of tools for piecewise-linear analysis, and we obtained a far-reaching generalization 
of the Maximum Principle of optimal control which applies to “hybrid” dynamics. 

In this report, we present some of the accomplishments of the project, selected to highlight 
the variety of projects pursued. Complete details on this and other work done under this grant 
can be found in the following Web pages: 

http://www.math.rutgers.edu/~sontag 


http://www.math.rutgers.edu/~sussmann


2 Recurrent Neural Networks 

It is said that saturation is the most commonly encountered nonlinearity in control engineering, 
so the development of techniques for the modeling and control of such systems is obviously of 
great interest. Saturations might occur in controls (discussed later) or in the rates of change of 
state variables. For linear systems, one is then led to the study of what are sometimes called 
recurrent neural networks, i.e. systems of the form 

$\dot{x} = \sigma(Ax + Bu)$

or their corresponding discrete-time versions, where A and B are as usual in linear systems
theory, combined with an output map $y = Cx$, typically a coordinate projection. (If we had
$\sigma$ = the identity function, we would be studying continuous-time time-invariant linear systems,
but typically $\sigma$ is a bounded map whose translates and dilations, just as with wavelet generators,
provide dense sets in appropriate function spaces, for instance tanh.) A different motivation
for the study of these systems is that they arise as a very stylized model of dynamically evolving
biological networks (one interprets the vector equations for x as representing the evolution of
an ensemble of n “neurons,” where each coordinate $x_i$ of x is a real-valued variable which
represents the internal state of the ith neuron, and each coordinate $u_i$, $i = 1, \ldots, m$, of u is
an external input signal; the coefficients $A_{ij}$, $B_{ij}$ denote the weights, intensities, or “synaptic
strengths,” of the various connections, and the coordinates of $y(t) = Cx(t)$ represent the output
of p probes, or measurement devices, each of which averages the activation values of several
neurons). Sometimes one considers small variants of the model shown above, including for
instance a linear term outside of the saturation, used to ensure stability, as in the well-known
model proposed by Hopfield for associative memory storage and retrieval.
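
To make the model concrete, the following is a minimal simulation sketch of the continuous-time recurrent net with state derivative sigma(Ax + Bu) and output y = Cx. It is only an illustration: the forward-Euler scheme, the step size, and the function name `simulate_recurrent_net` are assumptions introduced here and are not taken from the work being reported.

```python
import math


def simulate_recurrent_net(A, B, C, u, x0, t_final=1.0, dt=1e-3, sigma=math.tanh):
    """Forward-Euler integration of dx/dt = sigma(A x + B u(t)), y = C x.

    A, B, C are lists of rows; u maps a time t to a list of m input values.
    The saturation sigma is applied coordinatewise (tanh by default).
    """
    n = len(x0)
    x = list(x0)
    steps = int(round(t_final / dt))
    for s in range(steps):
        ut = u(s * dt)
        # Argument of the saturation: (A x + B u)_i for each neuron i.
        z = [sum(A[i][j] * x[j] for j in range(n))
             + sum(B[i][j] * ut[j] for j in range(len(ut)))
             for i in range(n)]
        x = [x[i] + dt * sigma(z[i]) for i in range(n)]
    y = [sum(C[r][j] * x[j] for j in range(n)) for r in range(len(C))]
    return x, y


if __name__ == "__main__":
    # One neuron driven by a constant input; its rate of change never exceeds 1.
    x, y = simulate_recurrent_net([[0.0]], [[1.0]], [[1.0]],
                                  lambda t: [10.0], [0.0])
    print(x, y)
```

Because tanh is bounded by 1 in absolute value, each state coordinate can move at most a distance t_final from its initial value, whatever the weights and inputs are. This is the rate saturation referred to above.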

Among many non-control applications, recurrent nets have been employed in the design of
control laws for robotic manipulators, speech recognition, speaker identification, formal language
inference, and sequence extrapolation for time series prediction. In control, recurrent
nets have been proposed as generic identification models or as prototype dynamic controllers,
though other architectures are also used. In addition, theoretical results about neural networks
established their universality as models for systems approximation as well as analog computing
devices (we reported on our work along these lines in a previous grant period; see an article
describing this work in Science, April 28, 1995). Special purpose chips have been built to implement
recurrent nets directly in hardware; for instance, Hitachi’s Wafer Scale Integration chips
implement Hopfield nets with over 500 neurons and 30,000 synaptic connections. Electrical
circuit implementations of recurrent nets, employing resistively connected networks of nonlinear
amplifiers, with the resistor characteristics used to reflect the desired weights, have been
suggested as analog computers, in particular for solving constrained optimization problems and
for implementing content-addressable memories.

The PI started a few years ago a program of study directed to questions of controllability,
stabilization, and (when outputs $y = Cx$ are considered) observability and parameter estimation
for such systems. Surprisingly, explicit necessary and sufficient tests are available for
observability and parameter identification, the nonlinear character (and universality) properties
of the class of systems notwithstanding. We cannot provide a reasonable discussion within the
constraints of this proposal, so the survey paper [17] (available in preprint form from the PI’s
web site) should be consulted for a recent exposition of our work in the above areas; we will
limit ourselves here to describing just one area.

The paper [4] showed that the system $\dot{x} = \sigma(Ax + Bu)$ is completely controllable provided
that (1) $\sigma$ belongs to a certain class of nonlinearities characterized among other properties by
exponential asymptotics (this class includes tanh, the typical saturation nonlinearity studied in
neural networks, but it excludes arctan, which qualitatively shares boundedness, monotonicity,
and concavity properties with tanh), and (2) $B \in \mathcal{B}_{n,m}$, the class of $n \times m$ matrices whose rows
are nonzero and distinct up to signs. An exposition can also be found in E. Sontag’s textbook
Mathematical Control Theory (second edition, Springer-Verlag, 1998). This left open a large
number of questions, all of which are of great interest, foremost among them: what can be
said if the hypothesis that $B \in \mathcal{B}_{n,m}$ is dropped? In general, obtaining necessary and sufficient
conditions for controllability when $B \notin \mathcal{B}_{n,m}$ appears to be a very difficult subject. In [9], we
showed that $B \in \mathcal{B}_{n,m}$ is necessary for a stronger form of complete controllability (local-local),
but it is easy to see that this condition is not necessary for plain controllability. We did produce
in that paper a complete solution for two-dimensional single-input ($n = 2$, $m = 1$) systems;
let us summarize those results. When $B = (b_1, b_2)' \notin \mathcal{B}_{n,1}$, we may assume after a rescaling
of inputs, changes of variables $x \to -x$ or $y \to -y$, and/or exchanges of variables, that one of
these cases holds: $B = (0,0)'$, $B = (0,1)'$, or $B = (1,1)'$, and in the first case we don’t have
controllability. In the remaining two cases, under a further feedback transformation of the type
$u \to ax + by + u'$, where $u'$ is a new control, one may transform a recurrent net, while preserving
controllability properties, into one of the two canonical forms: $\dot{x} = \sigma(ax + by)$, $\dot{y} = \sigma(u)$, which
are shown to be controllable if and only if $|a| < |b|$ and $b \neq 0$, or $\dot{x} = \sigma(ax + u)$, $\dot{y} = \sigma(by + u)$,
which are controllable if and only if $a \neq b$. Obtaining a condition without dimension constraints
is the final goal, but even characterizing the three-dimensional case seems nontrivial. An easier
question might be to determine whether the set of pairs (A, B) which result in controllable
systems (for $\sigma$ = tanh, let us say) is a semialgebraic set, as in the two-dimensional case. In
addition, the case when $\sigma$ does not satisfy the axioms in [4] represents an even more challenging
task. These questions remain open for further research.
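
The algebraic conditions mentioned in this section are simple enough to state as code. The sketch below checks the hypothesis on B (rows nonzero and distinct up to sign) and encodes the two canonical-form criteria for the planar single-input case exactly as they are quoted above. The function names are hypothetical, and the predicates only restate the conditions; they do not prove or test controllability by simulation.

```python
def rows_nonzero_distinct_up_to_sign(B, tol=0.0):
    """True if every row of B is nonzero and no two rows are equal or opposite."""
    rows = [tuple(r) for r in B]
    if any(all(abs(v) <= tol for v in r) for r in rows):
        return False  # some row is (numerically) zero
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            same = all(abs(p - q) <= tol for p, q in zip(rows[i], rows[j]))
            opposite = all(abs(p + q) <= tol for p, q in zip(rows[i], rows[j]))
            if same or opposite:
                return False
    return True


def canonical_form_1_controllable(a, b):
    """Criterion quoted for x' = sigma(a x + b y), y' = sigma(u)."""
    return abs(a) < abs(b) and b != 0


def canonical_form_2_controllable(a, b):
    """Criterion quoted for x' = sigma(a x + u), y' = sigma(b y + u)."""
    return a != b


if __name__ == "__main__":
    print(rows_nonzero_distinct_up_to_sign([[1.0], [2.0]]))   # hypothesis holds
    print(rows_nonzero_distinct_up_to_sign([[1.0], [-1.0]]))  # rows equal up to sign
    print(canonical_form_2_controllable(0.5, 0.5))
```

For n = 2 and m = 1 the hypothesis fails exactly when one entry of B vanishes or the two entries have equal absolute value, which is why the three normalized cases (0,0)', (0,1)' and (1,1)' appear.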


3 Systems with Input Saturations 

One of the interesting, and somewhat unexpected, places in which neural network (feedforward 
sigmoidal) models have appeared is in the design of global stabilizing feedback for systems with 
input constraints. Often control systems are designed based on linear systems theory, which 
ignores amplitude limitations on inputs. However, energy, mechanical, or safety requirements 
often impose limits on control authority, which may result in instabilities or in undesired invariant
sets (limit cycles, parasitic equilibria, etc). The classical approach to dealing with this
problem has been to attempt to prevent saturation, forcing the regulated system to stay within
a region of linear behavior. Most “anti-windup” methods fall in this category. On the other
hand, it is possible to approach the problem differently, and view the object to be controlled
as a nonlinear system of the type $\dot{x} = Ax + B\sigma(u)$ where A and B are as usual in linear
control theory and $\sigma$ is a saturation such as tanh or the standard clipping saturation, applied
coordinatewise. One line of work by the PIs deals with the study of such systems.

Fuller showed in the early 1970s that it is in general impossible to globally stabilize the
origin of these systems by means of linear feedback $u = Fx$ even if the system is open-loop
globally controllable to the origin. This suggests the obvious question of searching for nonlinear
feedback laws $u = k(x)$ that achieve such stabilization, and in particular for nicely behaved
and easily implementable controllers (in contrast to optimal control techniques, which result
in highly irregular feedback). In a well-known 1990 paper, the PIs proved that smooth
stabilization is always possible. Motivated by our paper, soon thereafter Teel made the groundbreaking
discovery that single-input multiple integrators can be stabilized by feedbacks which
are themselves compositions of linear functions and iterated saturations (“nested saturation”
technique). This, in turn, made us redirect our efforts to the use of Teel’s technique as well as
a variant (parallel saturations, which is a “neural network” architecture) in the general case of
open-loop asymptotically controllable linear systems with no exponential instabilities (the rank
of $[\lambda I - A, B]$ is n for all $\lambda$ on the imaginary axis, and A has no eigenvalues with positive real
part), obtaining general results, which appeared in various improved versions in the periods
covered by the previous grant. In this grant period, we continued this study, producing a
discrete-time version as well ([2]).
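
As an illustration of the nested-saturation idea, the sketch below simulates a double integrator under a bounded feedback built from a linear function inside a saturation inside another saturation. The coordinates (x2 and x1 + x2), the saturation levels 1 and 0.4, the Euler discretization, and all function names are assumptions chosen for this example; it is one commonly quoted form of the construction for this special case, not the general design discussed above.

```python
def sat(v, level=1.0):
    """Standard clipping saturation at +/- level."""
    return max(-level, min(level, v))


def nested_saturation_feedback(x1, x2):
    """Bounded feedback u = -sat_1(x2 + sat_0.4(x1 + x2)) for x1' = x2, x2' = u.

    The inner level (0.4) is taken smaller than half the outer level (1.0),
    so the outer saturation eventually stops being active.
    """
    return -sat(x2 + sat(x1 + x2, 0.4), 1.0)


def simulate_double_integrator(x1, x2, feedback, t_final=60.0, dt=0.01):
    """Forward-Euler simulation of the double integrator with input bound |u| <= 1."""
    for _ in range(int(round(t_final / dt))):
        u = feedback(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return x1, x2


if __name__ == "__main__":
    print(simulate_double_integrator(5.0, -3.0, nested_saturation_feedback))
```

Once both saturations are inactive the closed loop reduces to the linear law u = -x1 - 2 x2, so the state decays exponentially; the saturations only shape the transient for large initial conditions.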


4 Learning Theory and Identification 

The study of neural nets, and in particular of their “learning” (adaptive control, identification) 
capabilities, motivated us to initiate a program of research in computational learning theory, 
an active area of theoretical computer science. In particular, we have focused on the estimation 
of learning-theoretic (VC, Pollard) dimensions which are used as measures of interpolation and 
extrapolation (“generalization”) and pattern classification power; the many publications in our 
web site can be consulted for details on many projects. Our contributions in this area have
been recognized by that community; for instance we have given two plenaries at NIPS, the preeminent
and highly selective conference in the area, and were asked to deliver a short course
on neural network learning at a Newton Institute summer program (lectures described in [18]). 

One recent direction of study has been the generalization of dimension estimates for linear 
systems obtained in a previous grant period (IEEE Trans. Inform. Theory 42 (1996): 1479-1487)
to discrete ([7]) and continuous ([8]) time nonlinear systems, and especially the study of
dimension estimates, and their implications for sample complexity of worst case identification
for linear systems subject to bandwidth-restricted inputs ([19], [15]). That work takes a computational
learning theory approach to a problem of linear systems identification. It is assumed
there that input signals have only a finite number k of frequency components, and systems to 
be identified have dimension no greater than n. The main result established that the sample 
complexity needed for identification scales polynomially with n and logarithmically with k. Let 
us provide some details of this particular work. 

4.1 Learning and Linear Systems Identification 

The problem of systems identification may be seen as an instance of the general question of
“learning” an unknown function. Techniques from Computational Learning Theory (CLT) can
be applied, and our previous papers (previous grant period) had already provided results applicable
to the identification of discrete-time linear systems on finite-window data. For continuous-time
systems, the situation is complicated by the fact that, even for finite-length inputs, learnability
is impossible when formulated in the CLT framework, as can be seen by applying the
discrete-time results (through sampling). Thus, in our work, we supposed that all inputs to be
used, in the learning as well as in validation stages, belong to the linear span of a fixed number
k of sinusoidal basis functions. This band-limiting assumption allowed us to obtain a precise
result: the sample complexity needed for identification scales polynomially with an upper bound
on the dimension of the systems being identified, and logarithmically with k. This provides a tight
analogy to the discrete results previously obtained, in which k appeared as the length of the
discrete-time window employed.

In the context of learning we discuss continuous-time linear control systems:

$\dot{x} = Ax + Bu$, $x(0) = x^0$, $y = Cx$, (1)

where A, B, and C are $n \times n$, $n \times m$, and $p \times n$ real matrices, and the time interval is [0,1]. We
study sign-observations

$\operatorname{sign} y(1) = (\operatorname{sign} y_1(1), \ldots, \operatorname{sign} y_p(1))^T$,

where sign z = 0 if z < 0, sign z = 1 if z > 0, and T stands for the transpose. For scalar
observations this is a classification problem; each output is classified either 0 or 1 and the
VC-dimension can be used to study the learning complexity of the problem. (When p > 1, a
generalization of the VC-dimension or a loss function is needed.)

We consider controls $u = (u_1, \ldots, u_m)$ such that

$u = G\omega$,

where G is an $m \times k$ matrix that parametrizes the control. The set of basis input functions
$\Omega = \{\omega_1, \ldots, \omega_k\}$ is fixed. The bounds for the VC-dimension or other complexity dimensions
will depend on the properties of the set $\Omega$. For scalar inputs (i.e., m = 1) the VC-dimension
associated to the mapping from inputs G to scalar sign-observations is bounded by k, which in
fact can be very large in applications. This bound is tight; we give an example of a function class
$\Omega$ for which the associated VC-dimension is indeed k. By considering band-limited controls the
bound can be improved. In this work we consider the following set of basis input functions

$\Omega = \{\omega_1, \ldots, \omega_k ;\ \omega_1, \ldots, \omega_k$ linearly independent and
$\omega_j = t^{\ell_j} e^{\alpha_j t} \sin(\beta_j t)$ or $\omega_j = t^{\ell_j} e^{\alpha_j t} \cos(\beta_j t)$
with $\ell_j \in \mathbb{N}$, $\alpha_j, \beta_j \in \mathbb{R}$, $j = 1, \ldots, k\}$,

and let

$\ell_{\max} = \max\{\ell_1, \ldots, \ell_k\}$. (2)

Order the set of basis input functions $\Omega$ and denote $\omega = (\omega_1, \ldots, \omega_k)^T$. Let

$X_\Omega = \{G\omega : [0,1] \to \mathbb{R}^m ;\ G \in \mathbb{R}^{m \times k}\}$,

and for each linear system $\Sigma = (A, B, C, x^0)$ of dimension n define the mapping $\phi_\Sigma : X_\Omega \to \mathbb{R}^p$
by $\phi_\Sigma(G\omega) = y(1)$, where $y(1)$ is the solution of $\Sigma$ with control $u = G\omega$. Similarly we define
the mapping for sign-observations,

$S_\Sigma : X_\Omega \to \{0,1\}^p$, $G\omega \mapsto \operatorname{sign}(\phi_\Sigma(G\omega))$.

The class of above mappings is the sign system concept class

$\mathcal{C}_{m,p} = \{S_\Sigma ;\ \Sigma$ linear system of dimension $n\}$.
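
The objects just defined can be evaluated numerically. The sketch below computes the sign-observation at time 1 of a linear system driven by a control that is a linear combination G of fixed basis functions. The Euler discretization, the function names, and the convention that the sign of exactly zero is 0 are assumptions made for this illustration only.

```python
import math


def sign01(z):
    """Sign with values in {0, 1}; assigning 0 at z == 0 is an assumed convention."""
    return 1 if z > 0 else 0


def sign_observation(A, B, C, x0, G, omega, dt=1e-3):
    """Return sign y(1) for dx/dt = A x + B u, y = C x, with u(t) = G omega(t).

    A, B, C, G are lists of rows; omega is a list of k basis functions of time.
    The system is integrated on [0, 1] with the forward-Euler scheme.
    """
    n, m, k = len(A), len(G), len(omega)
    x = list(x0)
    for s in range(int(round(1.0 / dt))):
        t = s * dt
        w = [f(t) for f in omega]
        u = [sum(G[i][j] * w[j] for j in range(k)) for i in range(m)]
        dx = [sum(A[i][j] * x[j] for j in range(n))
              + sum(B[i][j] * u[j] for j in range(m))
              for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    y = [sum(C[r][j] * x[j] for j in range(n)) for r in range(len(C))]
    return [sign01(v) for v in y]


if __name__ == "__main__":
    # Integrator driven by one sinusoidal basis function (k = 1, m = 1, p = 1).
    omega = [lambda t: math.sin(math.pi * t)]
    print(sign_observation([[0.0]], [[1.0]], [[1.0]], [0.0], [[1.0]], omega))
    print(sign_observation([[0.0]], [[1.0]], [[1.0]], [0.0], [[-1.0]], omega))
```

Each fixed system gives one concept, a map from coefficient matrices G to labels in {0, 1}; the concept class collects these maps over all systems of dimension n.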

Theorem [Sample complexity for concept learning]. For the sign system concept class $\mathcal{C}_{m,1}$ with
scalar observations, i.e., $p = 1$, the sample complexity $s(\epsilon, \delta)$ for identifiers that agree with the
observed sample can be bounded as

$$s(\epsilon, \delta) \le \max\left\{ \frac{8\,\mathrm{VC}(\mathcal{C}_{m,1})}{\epsilon} \log_2\left(\frac{8e}{\epsilon}\right),\ \frac{4}{\epsilon} \log_2\left(\frac{2}{\delta}\right) \right\},$$

where

$$\mathrm{VC}(\mathcal{C}_{m,1}) \le 2(2mn^2 + 4n + 1) \log_2\left[ 8e\,(8mn^2 k(n + \ell_{\max}) + 1)(2nk + 2(1 + 2k)^n) \right]$$

and $\ell_{\max}$ is given by (2).

In terms of n (the dimension of the state space) and k (the band-width) the upper bound
for the VC-dimension is of the form $O(n^3 \log_2(nk))$. We also provided a VC-dimension lower
bound, which is, in terms of the band-width, of order $\log(k)$. In particular, in a typical
setting of fairly small system dimension n and large band-width k, the log k bound is a clear
improvement over the linear bound given by elementary analysis.
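
To see how the bound behaves, one can simply evaluate it. The sketch below assumes that the VC-dimension bound reads 2(2mn^2 + 4n + 1) log2[8e(8mn^2 k(n + l_max) + 1)(2nk + 2(1 + 2k)^n)] and that the sample complexity bound is the maximum of (8 VC / eps) log2(8e / eps) and (4 / eps) log2(2 / delta). Both readings are reconstructions of the theorem as displayed in this report, and the function names are hypothetical.

```python
import math


def vc_upper_bound(m, n, k, l_max):
    """Evaluate the (assumed) VC-dimension upper bound for scalar sign-observations."""
    # The logarithm of the product is taken factor by factor so that very large
    # integers such as (1 + 2k)^n never have to be converted to floats.
    log_arg = (math.log2(8 * math.e)
               + math.log2(8 * m * n**2 * k * (n + l_max) + 1)
               + math.log2(2 * n * k + 2 * (1 + 2 * k)**n))
    return 2 * (2 * m * n**2 + 4 * n + 1) * log_arg


def sample_complexity_bound(eps, delta, vc):
    """Evaluate the (assumed) sample complexity bound for accuracy eps, confidence delta."""
    return max(8 * vc / eps * math.log2(8 * math.e / eps),
               4 / eps * math.log2(2 / delta))


if __name__ == "__main__":
    for k in (10**3, 10**6, 10**9):
        vc = vc_upper_bound(m=1, n=2, k=k, l_max=0)
        print(k, round(vc), round(sample_complexity_bound(0.1, 0.05, vc)))
```

Doubling k adds only a fixed amount to the bound, whereas the elementary bound k doubles. For small n the logarithmic bound therefore overtakes the linear one once k is large enough; for moderate k the constants can still make it the weaker of the two.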

In our work, we illustrated how the system (1) with $x(0) = 0$ can be parametrized by
$n(m + 1)$ parameters. In the following definition we take the final time to be $\tau > 1$ in order to
show the effect of the time interval on the learning complexity:

Let $\lambda \in \mathbb{R}^{n(m+1)}$ be the system parameters as above with $\|\lambda\|_\infty = \max_{1 \le i \le n(m+1)} |\lambda_i| \le 1$
and let $F(\lambda, u) = y(\tau)$ be the solution of (1) with system parameters $\lambda$ and control
$u = (u_1, \ldots, u_m) \in \mathcal{U} = \{u = (u_1, \ldots, u_m) ;\ \int_0^\tau \cdots \le M,\ i = 1, \ldots, m\}$.
The class with bounded controls is defined as

$\mathcal{F}_B = \{F(\lambda, \cdot) : \mathcal{U} \to \mathbb{R} ;\ \|\lambda\|_\infty \le 1\}$.

Theorem [Sample complexity for proper agnostic learning]. Let $\kappa > 0$. Then the class $\mathcal{F}_B$ is
properly agnostically learnable from

$$O\left( \mathrm{fat}_{(1/4-\kappa)\epsilon}(\mathcal{F}_B)\, \log^2(\cdots) + \log(\cdots) \right)$$

samples, where

$$\mathrm{fat}_{(1/4-\kappa)\epsilon}(\mathcal{F}_B) \le \min\left\{ (m+1)\,n \log_2 \lfloor \cdots \rfloor,\ \ 2(m+4)\,n \log_2\left( 8e\,(nmM(n + \ell_{\max}) + 1)(2nk + 2(2k+1)^n) \right) \right\},$$

together with $\ell_{\max}$ given by (2), and M a constant satisfying

$$\int \cdots\, dt \le kM$$

for all $i = 1, \ldots, m$. In the above, $\lfloor x \rfloor$ stands for the integer part of x.

Let us discuss briefly the techniques used. When the basis input functions $\omega_1, \ldots, \omega_k$ satisfy
a certain rationality condition associated to the control system (we split the rational function into
pieces without poles) we show that the sign of the final state can be computed by a Boolean
formula evaluating polynomial equalities and inequalities. Then the complexity bound can
be obtained by counting arguments and using a result by Goldberg and Jerrum (1995). We
prove lower bounds for the VC-dimension with scalar sign-observations. In comparison to the
upper bound, the lower bound is more general; we just need to assume that the basis input
functions are continuous and independent. The bound is proved by using dual VC-dimension
and axis shattering introduced in our previous work. The bounds on the fat-shattering dimension
associated with proper agnostic learning are obtained with a very simple technique. The
paper also contains pseudo-dimension bounds with respect to loss functions that preserve the
rationality structure of the output.


5 Piecewise-Linear (“Hybrid”) Systems 

Artificial neural networks are sometimes proposed as a framework in which to integrate symbolic
and numeric computation (a point of view emphasized in [...]) and as an alternative source of models
for nonlinear control and identification. A different but parallel avenue to some of the same
conceptual issues is provided by the area now known as “hybrid systems” theory. Hybrid
systems theory has recently become the focus of increased research, as evidenced for instance
by the many conferences and workshops in the area. The PI is recognized as having originated
one of the first approaches to hybrid systems analysis, the theory of discrete-time piecewise
linear systems (PLS) introduced in the early 1980s (IEEE Trans. Autom. Control 26(1981):
346-358). Recently, several teams have initiated other research efforts on PLS. For instance,
Morari and his group at the ETH showed recently that the general class of hybrid “Mixed Logical
Dynamical (MLD)” systems is in a precise sense equivalent to that of PLS as introduced by
the PI, and based on this equivalence, and using tools from piecewise affine systems, studied
basic system-theoretic properties and suggested numerical tests based on mixed-integer linear
programming for checking controllability and observability. Recently, we were able to prove the
polynomial-time solvability of the state equivalence problem, which was a long-standing open
question; see [11].

Among the most basic questions which can be asked about any class of systems are those 
regarding equivalence, such as: given two systems, do they represent the same dynamics under 
a change of variables? As a preliminary step in answering such a question, one must determine 
if the state spaces of both systems are isomorphic in an appropriate sense. That is, one needs 
to know if an invertible change of variables is at all possible. Only later can one ask if the 
equations are the same. For classical, finite dimensional linear systems, this question is trivial, 
since only dimensions must match. For finite automata, similarly, the question is also trivial, 
because the cardinality of the state set is the only property that determines the existence of 
a relabeling of variables. For other classes of systems, however, the question is not as trivial, 
and single numbers such as dimensions or cardinalities may not suffice to settle the equivalence 
problem in the respective category. Given that the class of behaviors that can be represented 
by PLS is extremely large, it should come as no surprise that many of the basic verification and 
design objectives are NP-hard or even undecidable, as we have remarked in various publications. 
In our original work (Pacific J. Math., 98(1982): 183-201), we provided a characterization of the
Grothendieck group of the category, as well as a generalization of the Euler characteristic for
polyhedra (and certain theorems for Euler characteristics become trivial when interpreted in
these terms). Moreover, we proved existence of an algorithm for deciding if two PL sets (given
in terms of formulas in L) are isomorphic, via results on decidability of word problems and
results of Eilenberg and Schützenberger on finitely generated commutative monoids. Thus the
isomorphism problem is one problem that is decidable. However, the algorithm that results
from that approach has exponential time complexity. Obviously, having a polynomial time
algorithm should have a major impact on future studies of PL systems.

5.1 Some more details

In order to sketch the basic definitions for PL algebra and PL systems, it is convenient to
introduce the first order theory of the real numbers with addition and order. That is, we take
the first-order language L consisting of constants r and unary function symbols r(·), for each
real number r (the latter corresponding to “multiplication by the constant r”), as well as a binary
function symbol + and relation symbols > and =. A basic fact is that a quantifier elimination
theorem holds: every set defined by a formula in L is a PL set. That is to say, for any formula
$\Phi(x)$ with n free variables $x = x_1, \ldots, x_n$, the set $\{x \mid \Phi(x)\}$ is a PL set. (Of course, we can
enlarge the language by adding symbols for sets and maps already known to be PL.) This fact
is very simple to establish and it provides a very convenient tool for establishing the basic
theoretical properties of PL systems. Moreover, the proofs of these facts are constructive, in
that the actual quantifier elimination algorithm could in principle be used to compute feedback
laws and the like. Another constructively-proved fact from our 1982 paper is the following “global implicit
function theorem”: Assume that $\phi : X \times Y \to \mathbb{R}^n$ is a PL map, and assume that for each x
the equation $\phi(x, y) = 0$ can be solved for y. Then there is a PL map $\pi : X \to Y$ so that
$\phi(x, \pi(x)) = 0$ for all x. (Equivalently: for any PL subset $R \subseteq X \times Y$ with onto projection into
X, there is a PL map $\pi : X \to Y$ (a “section”) so that $(x, \pi(x)) \in R$ for all $x \in X$.) This fact
is central to the existence of feedback controllers.

A PL isomorphism is nothing else than an operation of the following type: make a finite
number of cuts along a set of lines (or segments), apply an affine (linear plus translation)
transformation to each piece (not dropping any lower-dimensional pieces), and finally paste
it all together. As an example, let us take the interior of the triangle in $\mathbb{R}^2$ obtained as
oc{(0,0), (1,1), (2,0)}, where we are using “oc” to indicate the interior of the convex hull of
the corresponding points. (We can also define this set, of course, as the intersection of the
three open half-planes $x_2 > 0$, $x_1 - x_2 > 0$, and $x_1 + x_2 < 2$.) We now show that this triangle is
PL isomorphic to the open square with vertices (0,0), (1,1), (0,1), and (1,0).
First we cut along the segment $S_1$ = oc{(1,0), (1,1)}, obtaining the union of $S_1$, $S_2$, and $S_3$,
where $S_2$ = oc{(0,0), (1,0), (1,1)} and $S_3$ = oc{(1,1), (1,0), (2,0)}. Next, we apply an affine
transformation to change $S_3$ into $S_3'$ = oc{(1,1), (0,0), (0,1)}. Finally, we apply an affine
transformation to change $S_1$ into the missing diagonal $S_1'$ = oc{(0,0), (1,1)}, and we glue it all
back. See Figure 1.


Figure 1: Example: triangle is PL isomorphic to square
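
The cut-and-paste construction of this example can be checked mechanically. The sketch below uses one possible choice of affine map, (x1, x2) -> (x2, x1 + x2 - 1), applied to both the right-hand piece and the cut segment, while the left-hand piece is left fixed. This particular map is an assumption made for the illustration (the maps used in the original figure may differ), and the function names are hypothetical. Exact rational arithmetic avoids rounding issues on the boundaries.

```python
from fractions import Fraction


def in_triangle(p):
    """Open triangle with vertices (0,0), (1,1), (2,0)."""
    x1, x2 = p
    return x2 > 0 and x1 - x2 > 0 and x1 + x2 < 2


def in_square(p):
    """Open unit square."""
    x1, x2 = p
    return 0 < x1 < 1 and 0 < x2 < 1


def pl_map(p):
    """Piecewise-affine map from the open triangle onto the open square."""
    x1, x2 = p
    if x1 < 1:
        return (x1, x2)            # left piece: identity
    return (x2, x1 + x2 - 1)       # cut segment and right piece: affine map


def pl_inverse(q):
    """Inverse of pl_map, again piecewise affine."""
    u, v = q
    if v < u:
        return (u, v)              # below the diagonal: identity
    return (v - u + 1, u)          # diagonal and above: inverse affine map


if __name__ == "__main__":
    p = (Fraction(3, 2), Fraction(1, 4))
    print(pl_map(p), pl_inverse(pl_map(p)) == p)
```

The left piece already fills the part of the square below the diagonal, the right piece is sent to the part above it, and the cut segment becomes the diagonal itself, so no lower-dimensional piece is dropped.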


One of the main results in our early 1980s work on piecewise-linear algebra provided a
classification of PL sets under isomorphism. The critical step in this classification is to associate
to each PL set X a “label” with the property that two spaces X and Y are isomorphic if and
only if their labels are related in a certain manner. (By analogy, two finite-dimensional real
vector spaces are linearly isomorphic if and only if their dimensions are the same, i.e., letting
the “label” be the dimension, if their labels coincide. But in the PL case, single integers
do not suffice as “labels”.) Labels are, by definition, polynomials in two variables x, y with
non-negative integer coefficients. We let $\mathbb{N}[x, y]$ denote the collection of all such polynomials.
Examples of labels are $1$, $x$, $y$, $x^3$, $1 + xy + x^2$, etc. We interpret the sum in $\mathbb{N}[x, y]$ as union of
disjoint sets and the product as Cartesian product of sets, the unit 1 as a one-element set, the
variable x as the open interval (0,1), and the variable y as the half-line $(0, +\infty)$. Thus, $x^3$ is
an open cube, and $1 + xy + x^2$ is the union of a point, a disjoint set $(0,1) \times (0, +\infty)$, and a unit
square disjoint from both. One may decompose any PL set into a finite union (algebraically,
a sum) of objects each of which is linearly isomorphic to a monomial in x and y. (Simplicial
decompositions provide a way to do this.) In this manner, a label (nonunique) can be associated
to each PL set.

Certain formal equalities are easy to establish. Splitting the interval x as

$(0,1) = (0,1/2) \cup \{1/2\} \cup (1/2,1)$,

and then using affine maps ($t \mapsto 2t$ and $t \mapsto 2t - 1$ respectively) to map the first and last interval
to x, we obtain “$x = 2x + 1$”. On the other hand, the split $y = (0, +\infty) = (0,1) \cup \{1\} \cup (1, +\infty)$
(and $t \mapsto t - 1$ applied to the last set) gives us the identity “$y = x + 1 + y$”. Drawing a
bisecting line through the first quadrant in $\mathbb{R}^2$ gives “$y^2 = y^2 + y + y^2$” (using, e.g., the linear
transformation $(t_1, t_2) \mapsto (t_1 - t_2, t_2)$ to send the lower triangle $\{(t_1, t_2) \mid t_2 > 0,\ t_1 > t_2\}$ to $y^2$).
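
Labels are easy to manipulate as polynomials with non-negative integer coefficients. The sketch below represents a label as a map from exponent pairs (i, j) of x^i y^j to coefficients, and checks that two simple quantities agree on both sides of each of the three identities: the value at x = y = -1 (an Euler-characteristic-like count) and the largest total degree (the dimension). Agreement of these quantities is only a necessary condition for two labels to be related by the identities. The code is an illustration with hypothetical names and is not the equivalence algorithm of [11].

```python
from collections import Counter

# A label is a Counter mapping (i, j) -> coefficient of x^i y^j.
ONE = Counter({(0, 0): 1})   # a single point
X = Counter({(1, 0): 1})     # the open interval (0, 1)
Y = Counter({(0, 1): 1})     # the open half-line (0, +infinity)


def add(p, q):
    """Disjoint union of sets corresponds to the sum of labels."""
    return p + q


def mul(p, q):
    """Cartesian product of sets corresponds to the product of labels."""
    out = Counter()
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return out


def euler_value(p):
    """Evaluate the label at x = y = -1."""
    return sum(c * (-1) ** (i + j) for (i, j), c in p.items())


def dimension(p):
    """Largest total degree of a monomial appearing in the label."""
    return max(i + j for (i, j) in p)


# The three elementary identities: x = 2x + 1, y = x + 1 + y, y^2 = 2y^2 + y.
IDENTITIES = [
    (X, add(add(X, X), ONE)),
    (Y, add(add(X, ONE), Y)),
    (mul(Y, Y), add(add(mul(Y, Y), Y), mul(Y, Y))),
]

if __name__ == "__main__":
    for lhs, rhs in IDENTITIES:
        print(euler_value(lhs), euler_value(rhs), dimension(lhs), dimension(rhs))
```

Since sums and products of labels are respected by both quantities, any two labels obtained from each other by repeated use of the identities must share them; the converse fails in general, which is why a finer analysis of the congruence is needed.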

It was shown in our previous work that these three identities are enough, in the sense that
two sets are isomorphic if and only if their labels can be obtained from each other by using
repeatedly these elementary identities. In other words, isomorphism is precisely determined by
the congruence generated by these identities in the semiring $\mathbb{N}[x, y]$. In this manner, one may apply
to the equivalence problem the results of Eilenberg and Schützenberger on finitely generated
commutative monoids that are obtained by quotients under such congruences. Equivalence
under congruences is in general non-polynomial time; however, exploiting the special form of
the congruences that define PL equivalence, we were able in [11] to find a polynomial time
algorithm for our problem. Our collaborator on this project, B. Dasgupta, has recently supervised
a Master’s thesis implementing the algorithm. As mentioned earlier, this is only a first
step in studying equivalence of PL systems, and further work is ongoing.


6 Networks of Spiking Neurons 

We have also continued work on a different type of network which represents neural populations, 
based on “spiking neurons” (information is encoded in inter-spike intervals). This biologically 
more realistic and appealing class of systems gives rise to a whole new set of questions. Some 
of our results regarding such models are outlined next. (Let us just add here that we have
also made initial progress towards the characterization of the structure of local minima of
associated fitting problems, using a combination of differential topology (Morse theoretic), logic,
and algebraic-geometric techniques such as previously employed in our work on critical points
of objective functions involving sigmoidal networks (Advances in Computational Mathematics,
5(1996): 245-268) and the geometry of Banach space techniques from our paper [1], in the
count of minima and the study of approximation rates.)

Experimental data show that biological synapses behave quite differently from the symbolic 
synapses in all common artificial neural network models. Biological synapses are dynamic, 
i.e., their “weight” changes on a short time scale by several hundred percent depending on
the past input to the synapse. In [12], we addressed the question of how this inherent synaptic
dynamics - which should not be confused with long term “learning” - affects the computational
power of a neural network. In particular we analyzed computations on temporal and
spatio-temporal patterns, and we gave a complete mathematical characterization of all filters that
can be approximated by feedforward neural networks with dynamic synapses. It turns out
that even with just a single hidden layer such networks can approximate a very rich class of
nonlinear filters: all filters that can be characterized by Volterra series. This result is robust
with regard to various changes in the model for synaptic dynamics. Our characterization result
provided for all nonlinear filters that are approximable by Volterra series a new complexity
hierarchy which is related to the cost of implementing such filters in neural systems. This set
of results has given rise to several follow-up papers, and has attracted considerable attention in 
the theoretical neuroscience community. Let us give some details next. 

Synapses in common artificial neural network models are static: the value $w_i$ of a synaptic
weight is assumed to change only during “learning”. In contrast, the “weight” $w_i(t)$ of
a biological synapse at time $t$ is known to depend strongly on the inputs $x_i(t - \tau)$ that
this synapse has received from the presynaptic neuron $i$ at previous time steps $t - \tau$. Several
recent papers have shown that a model of the form

$$w_i(t) = w_i \cdot D(t) \cdot (1 + F(t)) \qquad (3)$$

with a constant $w_i$, a depression term $D(t)$ with values in $(0,1]$, and a facilitation term $F(t) \geq 0$,
can be fitted remarkably well to experimental data for synaptic dynamics. The facilitation term
$F(t)$ is usually modeled as a linear filter with exponential decay: if $x_i(t - \tau)$ is the output of
the presynaptic neuron (typically modeled by a sum of $\delta$-functions), then the current value of
this facilitation term is of the form

$$F(t) = \rho \int_0^\infty x_i(t - \tau) \cdot e^{-\tau/\gamma} \, d\tau \qquad (4)$$

for certain parameters $\rho, \gamma > 0$ that vary from synapse to synapse. The analysis in our work is
primarily based on this model, but we also showed that our results hold for the somewhat
more complex models for synaptic dynamics obtained in a mean-field context.
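
The facilitation model in equations (3) and (4) can be illustrated numerically. The following is a minimal sketch, not code from [12]: it assumes a uniformly sampled input, treats the input before the first sample as zero, and replaces the integral by a recursive Riemann sum. The function names `facilitation` and `dynamic_weight` and all parameter values are hypothetical.

```python
import numpy as np

def facilitation(x, dt, rho, gamma):
    """Discretized Eq. (4): F(t) = rho * int_0^inf x(t - tau) exp(-tau/gamma) dtau.

    x holds the presynaptic activity sampled every dt; activity before the
    first sample is taken to be zero (an assumption of this sketch).
    """
    decay = np.exp(-dt / gamma)
    F = np.empty(len(x))
    acc = 0.0
    for k, xk in enumerate(x):
        # Recursive form of the exponentially weighted Riemann sum.
        acc = decay * acc + rho * xk * dt
        F[k] = acc
    return F

def dynamic_weight(x, dt, w, rho, gamma, D=None):
    """Eq. (3): w_i(t) = w_i * D(t) * (1 + F(t)); D defaults to 1 (no depression)."""
    F = facilitation(x, dt, rho, gamma)
    D = np.ones(len(x)) if D is None else np.asarray(D, dtype=float)
    return w * D * (1.0 + F)

# Constant input x = 1 over 25 time constants: F(t) should settle near rho * gamma.
dt, rho, gamma = 1e-3, 0.5, 0.2
x = np.ones(5000)
F = facilitation(x, dt, rho, gamma)
w_t = dynamic_weight(x, dt, w=2.0, rho=rho, gamma=gamma)
print(F[-1], w_t[-1])
```

For a constant input the integral in (4) equals $\rho\gamma$ exactly, so the printed facilitation value should be close to 0.1, up to the discretization error of the Riemann sum.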

We showed in [12] that such inherent synaptic dynamics empowers neural networks with 
a remarkable capability for carrying out computations on temporal patterns (i.e., time series) 
and spatio-temporal patterns. This computational mode, where inputs and outputs consist of 
temporal patterns or spatio-temporal patterns - rather than static vectors of numbers - appears 
to provide a more adequate framework for analyzing computations in biological neural systems. 
Furthermore their capability for processing temporal and spatio-temporal patterns in a very 
efficient manner may be linked to their superior capabilities for real-time processing of sensory 
input, hence our analysis may provide new ideas for designing artificial neural systems with
similar capabilities. 

We considered not just computations of neural systems with a single temporal pattern as
input, but also characterized their computational power for the case where several different
temporal patterns $u_1(t), \ldots, u_n(t)$ are presented in parallel as input to the neural system. Hence
we also provided a complete characterization of the computational power of feedforward
neural systems for the case where salient information is encoded in temporal correlations of firing
activity in different pools of neurons (represented by correlations among the corresponding
continuous functions $u_1(t), \ldots, u_n(t)$). Therefore various informal suggestions for computational
uses of such a code can be placed on a rigorous mathematical foundation: it is easy to see that a
large variety of computational operations that respond in a particular manner to correlations
in temporal input patterns define time invariant filters with fading memory, hence they can
in principle be implemented on each of the various kinds of dynamic networks considered in
our work. Previous standard models for computations on temporal patterns in artificial neural
networks are time-delay neural networks (where temporal structure is transformed into
spatial structure) and recurrent neural networks, both being based on standard “static” synapses.
Transforming temporal structure into spatial structure makes it impossible to let “time represent
itself” (in the language of Mead) in subsequent computations, which tends to result in a loss of
computational efficiency. The results of our work suggest that feedforward neural networks with
simple dynamic synapses provide an attractive alternative.

6.1 More Details 

In contrast to the static output of gates in feedforward artificial neural networks, the output
of biological neurons consists of action potentials (“spikes”), i.e., stereotyped events that mark
certain points in time. These spikes are transmitted by synapses to other neurons, where
they cause changes in the membrane potential that affect the times when these other neurons
fire and thereby emit a spike. Empirical data describe the amplitudes of EPSCs (excitatory
postsynaptic currents) in a neuron in response to a spike train from a presynaptic neuron.
These two neurons are likely to be connected by multiple synapses, and the resulting EPSC
amplitude can be understood as a population response of these multiple synapses. Therefore
it is justified to employ a deterministic model for synaptic dynamics in spite of the stochastic
nature of synaptic transmission at a single release site. The EPSC amplitude in response to a
spike is modeled by terms of the form $w \cdot (1 + \mathcal{F})$ and $w \cdot \mathcal{D} \cdot (1 + \mathcal{F})$, where $\mathcal{F}$ is a linear filter
with impulse response $\rho \cdot e^{-\tau/\gamma}$ modeling facilitation and $\mathcal{D}$ is some nonlinear filter modeling
depression at synapses. In some versions of the model considered in the literature, this filter
$\mathcal{D}$ consists of several depression terms. However, it only assumes values $> 0$ and is always time
invariant and has fading memory.

We analyzed the impact of this synaptic dynamics in the context of common models for
computations in populations of neurons where one can ignore the stochastic aspects of
computation in individual neurons in favor of the deterministic response of pools of neurons that receive
similar input (“population coding” or “space rate coding”). More precisely, our neural
network model is based on a mean-field analysis of networks of biological neurons, where pools
$P$ of neurons serve as computational units, whose time-varying firing activity (measured as the
number of neurons in $P$ that fire during a short time interval $[t, t + \Delta]$) is represented by a
continuous bounded function $y(t)$. In case pool $P$ receives inputs from $m$ other pools of
neurons $P_1, \ldots, P_m$, we assume that $y(t) = \sigma(\sum_{i=1}^m w_i(t) x_i(t) + w_0)$, where $x_i(t)$ represents the
time-varying firing activity in pool $P_i$ and $w_i(t)$ represents the time-varying average “weight”
of the synapses from neurons in pool $P_i$ to neurons in pool $P$. (The function $\sigma : \mathbb{R} \to \mathbb{R}$ is
some “activation function”, for example $\sigma(x) = 1/(1 + e^{-x})$; for the theorems, it suffices to
assume that $\sigma$ is continuous and not a polynomial.) We generalize the representation of the
dynamics of synapses from a nonlinear filter applied to a sequence of $\delta$-functions (i.e., to a spike
train) to a nonlinear filter applied to a continuous input function $x_i(t)$. Thus, if $x_i(t)$ is a
continuous function describing the firing activity in the $i$th presynaptic pool $P_i$ of neurons, we
model the size of the resulting synaptic input to a subsequent pool $P$ of neurons by terms of
the form $w_i(t) \cdot x_i(t)$ with $w_i(t) := w_i \cdot (1 + \mathcal{F}x_i(t))$ or $w_i(t) := w_i \cdot \mathcal{D}x_i(t) \cdot (1 + \mathcal{F}x_i(t))$, where
the filters $\mathcal{F}$ and $\mathcal{D}$ are defined as in previous literature. The first equation, which models only
facilitation, gives rise to the definition of the class DN of dynamic networks, and the second
equation, which models the more common co-occurrence of facilitation and depression, gives rise
to the definition of the class DN*.

We define the class DN of dynamic networks as the class of arbitrary feedforward networks
consisting of sigmoidal gates that map input functions $x_1(t), \ldots, x_m(t)$ to a function

$$y(t) = \sigma\Big(\sum_{i=1}^{m} w_i(t) x_i(t) + w_0\Big),$$

with

$$w_i(t) = w_i \cdot \Big(1 + \rho \int_0^\infty x_i(t - \tau) e^{-\tau/\gamma} \, d\tau\Big)$$

for parameters $w_i \in \mathbb{R}$ and $\rho, \gamma > 0$. Here $\sigma$ is some “activation function” from $\mathbb{R}$ into $\mathbb{R}$, for example
the logistic sigmoid function defined by $\sigma(x) = 1/(1 + e^{-x})$. We will assume in the following
only that $\sigma$ is continuous and not a polynomial. The slightly different class DN* is defined in
the same way, except that $w_i(t)$ is of the form

$$w_i(t) = w_i \cdot \mathcal{D}x_i(t) \cdot \Big(1 + \rho \int_0^\infty x_i(t - \tau) e^{-\tau/\gamma} \, d\tau\Big),$$

where $\mathcal{D}$ is some arbitrary given time invariant fading memory filter with values
$\mathcal{D}x_i(t) \in (0,1]$. Thus dynamic networks in DN or DN* are simply feedforward neural networks consisting
of sigmoidal neurons, where static weights $w_i$ are replaced by biologically realistic
history-dependent functions $w_i(t)$. The input to a dynamic network consists of an arbitrary vector of
functions $u_1(\cdot), \ldots, u_n(\cdot)$. The output of a dynamic network is defined as a weighted sum

$$z(t) = \sum_{i=1}^{k} a_i y_i(t) + a_0$$

of the time-varying outputs $y_1(t), \ldots, y_k(t)$ of certain sigmoidal neurons in the network, where
the “weights” $a_0, \ldots, a_k$ can be assumed to be static. Thus a dynamic network with $n$ inputs
maps $n$ input functions $u_1(\cdot), \ldots, u_n(\cdot)$ onto some output function $z(\cdot)$.
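
To make the definition of DN concrete, the following is a minimal sketch of a network with one layer of facilitating synapses, one layer of sigmoidal gates, and a static readout. It is an illustration under stated assumptions, not the construction used in the proofs: time is discretized with step `dt`, inputs before the first sample are treated as zero, each synapse has its own (hypothetical) parameters `rho[j, i]` and `gamma[j, i]`, and all function names are invented for this example.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def facilitated_weights(u, dt, W, rho, gamma):
    """w_ji(t) = W[j,i] * (1 + rho[j,i] * int_0^inf u_i(t - tau) exp(-tau/gamma[j,i]) dtau).

    u has shape (T, n); W, rho, gamma have shape (k, n). Returns shape (T, k, n).
    """
    T = u.shape[0]
    decay = np.exp(-dt / gamma)              # per-synapse decay factor, shape (k, n)
    acc = np.zeros_like(W, dtype=float)      # running Riemann sum of the integral
    out = np.empty((T,) + W.shape)
    for t in range(T):
        acc = decay * acc + u[t][None, :] * dt
        out[t] = W * (1.0 + rho * acc)
    return out

def dn_forward(u, dt, W, w0, rho, gamma, a, a0):
    """z(t) = sum_j a_j * sigma(sum_i w_ji(t) u_i(t) + w0_j) + a0."""
    w_t = facilitated_weights(u, dt, W, rho, gamma)              # (T, k, n)
    y = sigmoid(np.einsum("tkn,tn->tk", w_t, u) + w0[None, :])   # (T, k)
    return y @ a + a0                                            # (T,)

# Two input functions, three sigmoidal gates.
rng = np.random.default_rng(0)
n, k, T, dt = 2, 3, 400, 0.01
t = np.arange(T) * dt
u = np.stack([0.5 + 0.4 * np.sin(t), 0.5 + 0.4 * np.cos(2 * t)], axis=1)
W, w0 = rng.normal(size=(k, n)), rng.normal(size=k)
rho, gamma = np.full((k, n), 0.5), np.full((k, n), 0.3)
a, a0 = rng.normal(size=k), 0.1
z = dn_forward(u, dt, W, w0, rho, gamma, a, a0)
```

Setting `rho` to zero switches facilitation off, and the sketch then reduces to an ordinary static feedforward network applied pointwise in time, which is a convenient sanity check.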

Networks that operate on temporal patterns map functions of time onto functions of time.
Let us call these operators filters. We will reserve the letters $\mathcal{F}, \mathcal{H}, \mathcal{S}$ for filters, and we write $\mathcal{F}u$
for the function resulting from an application of the filter $\mathcal{F}$ to a vector $u$ of functions. Notice
that when we write $\mathcal{F}u(t)$ we mean, of course, $(\mathcal{F}u)(t)$ (that is, the function $\mathcal{F}u$ evaluated at
time $t$). We write $C(A, B)$ for the class of all continuous functions $f : A \to B$. We will consider
suitable subclasses $U \subseteq C(A, B)$ for $A \subseteq \mathbb{R}^k$ and $B \subseteq \mathbb{R}$, and study filters that map $U^n$ into
$\mathbb{R}^{\mathbb{R}}$ (where $\mathbb{R}^{\mathbb{R}}$ is the class of all functions from $\mathbb{R}$ into $\mathbb{R}$), i.e., filters that map $n$ functions
$u_1(\cdot), \ldots, u_n(\cdot)$ onto another function $z(\cdot)$. Let us focus for simplicity on the case $k = 1$, i.e., the
case where the input functions $u_1(\cdot), \ldots, u_n(\cdot)$ are functions of a single variable, which we will
interpret as time. The case $k > 1$ (spatio-temporal patterns) was also studied in our work.

A trivial special case of a filter is the shifting filter $\mathcal{S}_{t_0}$ with $\mathcal{S}_{t_0}u(t) = u(t - t_0)$. An arbitrary
filter $\mathcal{F} : U^n \to \mathbb{R}^{\mathbb{R}}$ is called time invariant if a shift of the input functions by a constant $t_0$
just causes a shift of the output function by the same constant $t_0$, i.e., if for any $t_0 \in \mathbb{R}$ and
any $u = \langle u_1, \ldots, u_n \rangle \in U^n$ one has that $\mathcal{F}u^{t_0}(t) = \mathcal{F}u(t - t_0)$, where $u^{t_0} = \langle \mathcal{S}_{t_0}u_1, \ldots, \mathcal{S}_{t_0}u_n \rangle$.
All filters considered in our work are time invariant. Note that if $U$ is closed under $\mathcal{S}_{t_0}$ for all
$t_0 \in \mathbb{R}$ then a time invariant filter $\mathcal{F} : U^n \to \mathbb{R}^{\mathbb{R}}$ is fully characterized by the values $\mathcal{F}u(0)$ for
$u \in U^n$.

Another essential property of filters considered in our work was “fading memory” in the
sense of Boyd and Chua. If a filter $\mathcal{F}$ has fading memory then the value of $\mathcal{F}v(0)$ can be
approximated arbitrarily closely by the value of $\mathcal{F}u(0)$ for functions $u$ that approximate the
functions $v$ for sufficiently long bounded intervals $[-T, 0]$. The formal definition is as follows:
a filter $\mathcal{F} : U^n \to \mathbb{R}^{\mathbb{R}}$ has fading memory if for every $v = \langle v_1, \ldots, v_n \rangle \in U^n$ and every $\varepsilon > 0$
there exist $\delta > 0$ and $T > 0$ so that $|\mathcal{F}v(0) - \mathcal{F}u(0)| < \varepsilon$ for all $u = \langle u_1, \ldots, u_n \rangle \in U^n$ with
the property that $\|v(t) - u(t)\| < \delta$ for all $t \in [-T, 0]$.
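
As a worked illustration of this definition, the following derivation checks directly that the exponential facilitation filter of equation (4) has fading memory. It is a sketch under two assumptions not stated at this point in the text: there is a single input ($n = 1$), and all inputs take values in a bounded interval $[B_0, B_1]$ (the setting of the Theorem in this section). The particular choices of $\delta$ and $T$ are one convenient option.

```latex
% Facilitation filter evaluated at time 0 (single input with values in [B_0, B_1]):
\[
(\mathcal{F}u)(0) = \rho \int_0^\infty u(-\tau)\, e^{-\tau/\gamma}\, d\tau .
\]
% Suppose |v(t) - u(t)| < \delta on [-T, 0]; outside that window we only know
% |v(t) - u(t)| \le B_1 - B_0. Split the integral at \tau = T:
\begin{align*}
|\mathcal{F}v(0) - \mathcal{F}u(0)|
  &\le \rho \int_0^T |v(-\tau) - u(-\tau)|\, e^{-\tau/\gamma}\, d\tau
     + \rho \int_T^\infty |v(-\tau) - u(-\tau)|\, e^{-\tau/\gamma}\, d\tau \\
  &<  \rho\gamma\delta + \rho\gamma\,(B_1 - B_0)\, e^{-T/\gamma} .
\end{align*}
% Given \varepsilon > 0, take \delta = \varepsilon / (2\rho\gamma) and any T > 0 with
% T \ge \gamma \ln\bigl( 2\rho\gamma (B_1 - B_0) / \varepsilon \bigr).
% Then each term is at most \varepsilon / 2, so |\mathcal{F}v(0) - \mathcal{F}u(0)| < \varepsilon.
```

The exponential decay of the kernel is what allows the contribution of the remote past to be made arbitrarily small by enlarging $T$.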

It is obvious that any filter $\mathcal{F}$ which can be represented by a sum of finitely many Volterra
terms of any order (i.e., by a Volterra polynomial or finite Volterra series) is time invariant
and has fading memory. This holds for any class $U$ of uniformly bounded input functions $u$.
Both of these properties are inherited by filters $\mathcal{F}$ that can be approximated by some arbitrary
infinite sequence of such filters. This implies that any filter that can be approximated by finite
or infinite Volterra series (which converge in the sense used here) is time invariant and has
fading memory (over any class $U$ of uniformly bounded functions $u$). Boyd and Chua showed
in 1985 that under reasonable additional assumptions about $U$ the converse also holds: any
time invariant filter $\mathcal{F} : U \to \mathbb{R}^{\mathbb{R}}$ with fading memory can be approximated arbitrarily closely
by Volterra polynomials.
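
A small numerical sketch may help fix ideas about Volterra polynomials. The code below evaluates a second-order Volterra polynomial on a sampled input and checks time invariance in discrete form. It makes simplifying assumptions that are not part of the original text: the kernels are truncated to a finite window of `m` samples (so the filter has finite memory, a special case of fading memory), the integrals are replaced by Riemann sums, and the kernels `h1`, `h2` and the function name `volterra2` are hypothetical choices.

```python
import numpy as np

def volterra2(u, dt, h0, h1, h2):
    """Second-order Volterra polynomial with kernels truncated to m samples.

    (Fu)(t) = h0 + int h1(tau) u(t - tau) dtau
                 + int int h2(tau1, tau2) u(t - tau1) u(t - tau2) dtau1 dtau2,
    approximated by Riemann sums. h1 has length m and h2 has shape (m, m).
    Output entry i corresponds to input sample index i + m - 1.
    """
    m = len(h1)
    out = np.empty(len(u) - m + 1)
    for k in range(m - 1, len(u)):
        past = u[k - m + 1:k + 1][::-1]      # past[j] = u(t_k - j * dt)
        out[k - m + 1] = h0 + dt * (h1 @ past) + dt**2 * (past @ h2 @ past)
    return out

dt, m = 0.01, 200
tau = np.arange(m) * dt
h1 = np.exp(-tau / 0.3)
h2 = 0.5 * np.outer(np.exp(-tau / 0.2), np.exp(-tau / 0.2))
t = np.arange(1000) * dt
u = 0.5 + 0.4 * np.sin(3 * t)
y = volterra2(u, dt, 0.1, h1, h2)

# Time invariance: shifting the input by `shift` samples shifts the output equally.
shift = 50
y_from_shifted_input = volterra2(u[shift:], dt, 0.1, h1, h2)
time_invariant = np.allclose(y_from_shifted_input, y[shift:])

# Finite memory: changing the input only before the kernel window leaves the
# most recent output value unchanged.
u_changed = u.copy()
u_changed[:len(u) - m] += 0.3
same_last_value = np.isclose(volterra2(u_changed, dt, 0.1, h1, h2)[-1], y[-1])
print(time_invariant, same_last_value)
```

With untruncated, exponentially decaying kernels the second check would hold only approximately, with an error that shrinks as the unchanged window grows, which is the fading memory property.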

One of our theorems shows that simple filters that only model synaptic facilitation (as
considered in the definition of DN) already provide the networks with sufficient dynamics to
approximate arbitrary given time invariant filters with fading memory. We showed that the
simultaneous occurrence of depression (as in DN*) is not needed for that, but it also does not
hurt. This appears to be of some interest for the analysis of computations in biological neural
systems, since a fairly large variety of different functional roles have already been proposed for
synaptic depression: explaining psychological data on conditioning and reinforcement
(Grossberg), boundary formation in vision and visual persistence, switching between different neural
codes, and automatic gain control. Complementing these conjectured roles for synaptic
depression, we also proved a theorem which points to a possible functional role for synaptic
facilitation: it empowers even very shallow feedforward neural systems with the capability to
approximate basically any linear or nonlinear filter that appears to be of interest in a biological
context. Furthermore, we showed that this possible functional role for facilitation can co-exist with
independent other functional roles for synaptic depression: our result shows that one can first
choose the parameters that control synaptic depression to serve some other purpose, and can
then still choose the parameters that control synaptic facilitation so that the resulting neural
system can approximate any given time invariant filter with fading memory.

Theorem. Assume that $U$ is the class of functions from $\mathbb{R}$ into $[B_0, B_1]$ which satisfy
$|u(t) - u(s)| \leq B_2 \cdot |t - s|$ for all $t, s \in \mathbb{R}$, where $B_0, B_1, B_2$ are arbitrary real-valued constants with
$0 < B_0 < B_1$ and $0 < B_2$. Let $\mathcal{F}$ be an arbitrary filter that maps vectors $u = \langle u_1, \ldots, u_n \rangle \in U^n$
into functions from $\mathbb{R}$ into $\mathbb{R}$.

Then the following are equivalent:

(a) $\mathcal{F}$ can be approximated by dynamic networks $\mathcal{S} \in$ DN (i.e., for any $\varepsilon > 0$ there exists
some $\mathcal{S} \in$ DN such that $|\mathcal{F}u(t) - \mathcal{S}u(t)| < \varepsilon$ for all $u \in U^n$ and all $t \in \mathbb{R}$)

(b) $\mathcal{F}$ can be approximated by dynamic networks $\mathcal{S} \in$ DN with just a single layer of sigmoidal
neurons

(c) $\mathcal{F}$ is time invariant and has fading memory

(d) $\mathcal{F}$ can be approximated by a sequence of (finite or infinite) Volterra series.

These equivalences remain valid if DN is replaced by DN*.

The following result follows from the above Theorem. It shows that the class of filters 
that can be approximated by dynamic networks is very stable with regard to changes in the 
definition of a dynamic network. 

Corollary. Dynamic networks with just one layer of dynamic synapses and one subsequent 
layer of sigmoidal gates can approximate the same class of filters as dynamic networks with 
an arbitrary finite number of layers of dynamic synapses and sigmoidal gates. Even with a 
sequence of dynamic networks that have an unboundedly growing number of layers one cannot 
approximate more filters. 

Furthermore, if one restricts the synaptic dynamics in the definition of dynamic networks to
the simplest form $w_i(t) = w_i \cdot (1 + \rho \int_0^\infty x_i(t - \tau) e^{-\tau/\gamma} \, d\tau)$ with some arbitrarily fixed $\rho > 0$ and
time constants $\gamma$ from some arbitrarily fixed interval $[a, b]$ with $0 < a < b$, the resulting class
of dynamic networks can still approximate (with just one layer of sigmoidal neurons) any filter
that can be approximated by a sequence of arbitrary dynamic networks as defined. In the case
of DN* one can either choose to fix $\rho > 0$ or one can arbitrarily fix the interval $[a, b]$ for the
value of $\gamma$.

7 Optimal Control of Hybrid Systems 

The problem of optimal control for hybrid systems, mixing continuous and discrete variables, 
is recognized as one of the central challenges in the emerging hybrid system area, and work 
carried out under this grant resulted in substantial advances. Indeed, the papers [20], [21], 
and [22] provided different versions of the Maximum Principle of optimal control for hybrid 
systems, under minimal regularity conditions. In this short summary, we will define the class 
of hybrid problems to be considered and then state informally the Maximum Principle, leaving 
aside a detailed specification of technical assumptions. (The references given above should be 
consulted for all details.) The results in the papers [20], [21], and [22] are stronger than the 
usual versions of the finite-dimensional maximum principle. For example, even the theorem for
classical differentials applies to situations where the maps are not of class $C^1$, and can fail to
be Lipschitz continuous. The “nonsmooth” result applies to maps that are neither Lipschitz
continuous nor differentiable in the classical sense. From now on, the expression “smooth
manifold” (or, simply, the word “manifold”) means “finite-dimensional Hausdorff manifold of
class $C^1$ without boundary.” If $M$ is a manifold, and $x \in M$, then $T_x M$, $T_x^* M$, $TM$, $T^* M$
denote, respectively, the tangent and cotangent spaces of $M$ at $x$, and the tangent and cotangent
bundles of $M$. We start with several definitions.

A finite family of state spaces is a pair $(Q, \mathcal{M})$ such that

FFSS1. $Q$ is a finite set;

FFSS2. $\mathcal{M} = \{M_q\}_{q \in Q}$ is a family of smooth manifolds, indexed by $Q$.

If $(Q, \mathcal{M})$ is a finite family of state spaces, then for each pair $(q, q') \in Q \times Q$ we use $M_{q,q'}$ to
denote the product $M_q \times M_{q'} \times \mathbb{R} \times \mathbb{R}$.

A switching constraint for a finite family of state spaces $(Q, \mathcal{M})$ is a family
$\mathcal{S} = \{S_{q,q'}\}_{(q,q') \in Q \times Q}$ such that $S_{q,q'}$ is a subset of $M_{q,q'}$ for every pair $(q, q') \in Q \times Q$.

The following is the definition of “hybrid control system” that will be adopted for the
purposes here. A hybrid control system is a 6-tuple

$$\Sigma = (Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S})$$

such that

HCS1. $(Q, \mathcal{M})$ is a finite family of state spaces;

HCS2. $U = \{U_q\}_{q \in Q}$ is a family of sets;

HCS3. $f = \{f_q\}_{q \in Q}$ is a family such that $f_q$ is, for each $q$, a partially defined map from $M_q \times U_q \times \mathbb{R}$
to $TM_q$, having the property that $f_q(x, u, t)$ belongs to $T_x M_q$ for every $(x, u, t) \in M_q \times U_q \times \mathbb{R}$
for which $f_q(x, u, t)$ is defined;

HCS4. $\mathcal{U} = \{\mathcal{U}_q\}_{q \in Q}$ is a family consisting, for each $q$, of a set $\mathcal{U}_q$, each of whose members is a
map $\eta : I_\eta \to U_q$ defined on some subinterval $I_\eta$ of $\mathbb{R}$;

HCS5. $\mathcal{S} = \{S_{q,q'}\}_{(q,q') \in Q \times Q}$ is a switching constraint for $(Q, \mathcal{M})$.

The sets $S_{q,q'}$ are the switching sets of $\Sigma$, and are allowed to be empty. One should think of $S_{q,q'}$
as the set of all 4-tuples $(x, x', t, t')$ such that $x \in M_q$, $x' \in M_{q'}$, and a switching (or “jump”)
from state $x \in M_q$ to state $x' \in M_{q'}$ is permitted at time $t$, with a resetting of the clock to
time $t'$. Usually, one does not want to permit clock resetting, but for mathematical reasons it
is better to allow it in principle, and exclude it, when desired, by just taking the switching sets
$S_{q,q'}$ to consist only of points of the form $(x, x', t, t)$.

The members of $Q$ are called locations. The families $\mathcal{M}$, $U$ are, respectively, the family of
state spaces and the family of control spaces of $\Sigma$. For each $q$, the manifold $M_q$, the set $U_q$, the
map $f_q$, and the set $\mathcal{U}_q$ are, respectively, the state space, the control space, the dynamical law,
and the class of admissible controls at location $q$. Usually, $Q$ will be the set of states of some
finite automaton.

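A control for a hybrid system $\Sigma$ as above is a triple $\zeta = (\mathbf{q}, \mathbf{I}, \boldsymbol{\eta})$ such that

• $\mathbf{q} = (q_1, \ldots, q_\nu)$ is a finite sequence of locations;

• $\mathbf{I} = (I_1, \ldots, I_\nu)$ is a finite sequence of compact intervals;

• $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_\nu)$ is a finite sequence such that $\eta_j$ belongs to $\mathcal{U}_{q_j}$ and $I_{\eta_j} = I_j$ for $j = 1, \ldots, \nu$.

If $\zeta = (\mathbf{q}, \mathbf{I}, \boldsymbol{\eta})$ is a control, and $\mathbf{I} = (I_1, \ldots, I_\nu)$, we use $\mathbf{q}(\zeta)$, $\mathbf{I}(\zeta)$, $\boldsymbol{\eta}(\zeta)$, $\nu(\zeta)$ to
denote, respectively, the finite sequences $\mathbf{q}$, $\mathbf{I}$, $\boldsymbol{\eta}$, and the natural number $\nu$. If $I_j = [t_j, \tau_j]$ for
$j = 1, \ldots, \nu$, we use $\mathbf{t}(\zeta)$, $\boldsymbol{\tau}(\zeta)$ to denote the sequences $(t_1, \ldots, t_\nu)$ and $(\tau_1, \ldots, \tau_\nu)$, and we let
$a_\zeta = t_1$, $b_\zeta = \tau_\nu$. Then $a_\zeta$, $b_\zeta$, $\nu(\zeta) - 1$, and $\mathbf{q}(\zeta)$ are, respectively, the initial time, the
terminal time, the number of switchings, and the switching strategy of $\zeta$.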

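If $\Sigma = (Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S})$ is a hybrid system as above, $\zeta$ is a control for $\Sigma$, and $\nu = \nu(\zeta)$,
then a pretrajectory for $\zeta$ is a $\nu$-tuple $\xi = (\xi_1, \ldots, \xi_\nu)$ such that, if

$$\mathbf{I}(\zeta) = (I_1, \ldots, I_\nu), \quad I_j = [t_j, \tau_j], \quad \mathbf{q}(\zeta) = (q_1, \ldots, q_\nu), \quad \boldsymbol{\eta}(\zeta) = (\eta_1, \ldots, \eta_\nu),$$

then, for each $j \in \{1, \ldots, \nu\}$, $\xi_j$ is an absolutely continuous map from $I_j$ to the manifold $M_{q_j}$,
having the property that $f_{q_j}(\xi_j(t), \eta_j(t), t)$ is defined and $\dot{\xi}_j(t) = f_{q_j}(\xi_j(t), \eta_j(t), t)$ for almost
all $t \in I_j$.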

If $\Sigma$ is a hybrid system as above, a pretrajectory-control pair for $\Sigma$ is a pair $(\xi, \zeta)$ such that
$\zeta$ is a control for $\Sigma$ and $\xi$ is a pretrajectory of $\Sigma$ for $\zeta$.

We use $PTCP(\Sigma)$ to denote the set of all pretrajectory-control pairs of the system $\Sigma$.

An endpoint constraint for a finite family of state spaces $(Q, \mathcal{M})$ is a family
$\mathcal{E} = \{E_{q,q'}\}_{(q,q') \in Q \times Q}$ of sets such that $E_{q,q'}$ is, for each $(q, q') \in Q \times Q$, a subset of $M_{q,q'}$.

Notice that, mathematically, an endpoint constraint is exactly the same kind of object as
a switching constraint. This is why the part of the maximum principle that has to do with the
switchings will have the same form as the transversality condition.

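Let $\Sigma = (Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S})$ be a hybrid control system as in the previous definitions, and
let $\Xi = (\xi, \zeta)$ belong to $PTCP(\Sigma)$. Let $\nu = \nu(\zeta)$, $\xi = (\xi_1, \ldots, \xi_\nu)$, $\mathbf{q}(\zeta) = (q_1, \ldots, q_\nu)$,
$\boldsymbol{\tau}(\zeta) = (\tau_1, \ldots, \tau_\nu)$, $\mathbf{t}(\zeta) = (t_1, \ldots, t_\nu)$, $\mathbf{I}(\zeta) = (I_1, \ldots, I_\nu)$, $\mathcal{S} = \{S_{q,q'}\}_{(q,q') \in Q \times Q}$. Then

• The endpoint condition of $\xi$ (or of $\Xi$) is the 4-tuple

$$\partial\xi = \partial\Xi = (\xi_\nu(b_\zeta), \xi_1(a_\zeta), b_\zeta, a_\zeta) \in M_{q_\nu, q_1}. \qquad (5)$$

• If $1 \leq j < \nu$, the “$j$-th jump” of $\xi$ (or of $\Xi$) is the 4-tuple

$$\partial_j\xi = \partial_j\Xi = (\xi_j(\tau_j), \xi_{j+1}(t_{j+1}), \tau_j, t_{j+1}) \in M_{q_j, q_{j+1}}. \qquad (6)$$

• If $\mathcal{E} = \{E_{q,q'}\}_{(q,q') \in Q \times Q}$ is an endpoint constraint for $(Q, \mathcal{M})$, we say that $\Xi$ satisfies the
constraint $\mathcal{E}$ if $\partial\Xi$ belongs to $E_{q_\nu, q_1}$.

• We say that $\xi$ (or $\Xi$) satisfies the switching conditions for $\Sigma$ if $\partial_j\Xi$ belongs to $S_{q_j, q_{j+1}}$
whenever $j \in \{1, \ldots, \nu - 1\}$.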

If $\Sigma = (Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S})$ is a hybrid system as above, then

• we say that a pretrajectory $\xi$ of $\Sigma$ is a trajectory of $\Sigma$ if $\xi$ satisfies the switching conditions
for $\Sigma$;

• we use $TCP(\Sigma)$ to denote the set of all trajectory-control pairs of $\Sigma$ (i.e., the set of all
$\Xi = (\xi, \zeta) \in PTCP(\Sigma)$ such that $\xi$ is a trajectory of $\Sigma$), and $TCP(\Sigma; \mathcal{E})$ to denote the
set of all $\Xi \in TCP(\Sigma)$ that satisfy the endpoint constraint $\mathcal{E}$.
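
The definitions of locations, arcs, jumps, and switching conditions can be mirrored in a small data structure. The following is a minimal sketch under several assumptions: states are tuples of floats, each arc of a pretrajectory is approximated by forward Euler steps, and each switching set is represented by a membership test on the 4-tuple $(x, x', t, t')$. The class and function names, and the two-location example with a threshold at 2, are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, ...]
Jump = Tuple[State, State, float, float]  # (x, x', t, t')

@dataclass
class HybridSystem:
    dynamics: Dict[str, Callable[[State, float, float], State]]  # f_q(x, u, t)
    switching: Dict[Tuple[str, str], Callable[[Jump], bool]]     # membership test for S_{q,q'}

@dataclass
class Arc:
    location: str
    times: List[float]   # sample times in I_j = [t_j, tau_j]
    states: List[State]  # approximate values of xi_j at those times

def euler_arc(system, q, x0, eta, t0, t1, steps=1000):
    """Approximate one arc of a pretrajectory at location q by forward Euler."""
    h = (t1 - t0) / steps
    times = [t0 + i * h for i in range(steps)] + [t1]
    states = [tuple(x0)]
    for t in times[:-1]:
        x = states[-1]
        dx = system.dynamics[q](x, eta(t), t)
        states.append(tuple(xi + h * di for xi, di in zip(x, dx)))
    return Arc(q, times, states)

def jumps(arcs):
    """The j-th jump is (xi_j(tau_j), xi_{j+1}(t_{j+1}), tau_j, t_{j+1})."""
    return [(a.states[-1], b.states[0], a.times[-1], b.times[0])
            for a, b in zip(arcs, arcs[1:])]

def satisfies_switching_conditions(system, arcs):
    for a, b, jump in zip(arcs, arcs[1:], jumps(arcs)):
        member = system.switching.get((a.location, b.location))
        if member is None or not member(jump):  # a missing entry models an empty set
            return False
    return True

# Two locations; switching from "on" to "off" is allowed only without a state
# jump, without clock resetting, and once the state has reached 2.
system = HybridSystem(
    dynamics={"on": lambda x, u, t: (u,), "off": lambda x, u, t: (-u,)},
    switching={("on", "off"):
               lambda j: j[0] == j[1] and j[0][0] >= 2.0 - 1e-9 and j[2] == j[3]},
)
eta = lambda t: 1.0
on_arc = euler_arc(system, "on", (0.0,), eta, 0.0, 2.0)
off_arc = euler_arc(system, "off", on_arc.states[-1], eta, 2.0, 3.0)
reset_arc = euler_arc(system, "off", on_arc.states[-1], eta, 2.5, 3.5)
print(satisfies_switching_conditions(system, [on_arc, off_arc]))
print(satisfies_switching_conditions(system, [on_arc, reset_arc]))
```

In this sketch the pair `[on_arc, off_arc]` plays the role of a trajectory, while `[on_arc, reset_arc]` is only a pretrajectory because its jump resets the clock, which the example switching set forbids.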

If $\Sigma$ is a hybrid system as above, then a Lagrangian for $\Sigma$ is a family $L = \{L_q\}_{q \in Q}$ such
that

• $L_q$ is, for each $q \in Q$, a partially defined real-valued function on the product $M_q \times U_q \times \mathbb{R}$,

• whenever $q \in Q$, $\eta \in \mathcal{U}_q$ has domain $[\alpha, \beta]$, and $\xi : [\alpha, \beta] \to M_q$ is an absolutely
continuous solution of $\dot{\xi}(t) = f_q(\xi(t), \eta(t), t)$ a.e., it follows that the function
$[\alpha, \beta] \ni t \mapsto L_q(\xi(t), \eta(t), t)$ is defined for almost every $t$, and is integrable.

A switching cost function for $\Sigma$ is a family $\Phi = \{\Phi_{q,q'}\}_{(q,q') \in Q \times Q}$ such that each $\Phi_{q,q'}$ is an
extended real-valued function on $S_{q,q'}$ that never takes the value $-\infty$.

An endpoint cost function for $\Sigma$ is a family $\varphi = \{\varphi_{q,q'}\}_{(q,q') \in Q \times Q}$ such that each $\varphi_{q,q'}$ is an
extended real-valued function on $M_{q,q'}$ that never takes the value $-\infty$.

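If $L = \{L_q\}_{q \in Q}$ is a Lagrangian for the hybrid control system $\Sigma$, then we can define the
corresponding Lagrangian cost functional $C_L : TCP(\Sigma) \to \mathbb{R}$, by letting

$$C_L(\xi, \zeta) = \sum_{j=1}^{\nu} \int_{I_j} L_{q_j}(\xi_j(t), \eta_j(t), t) \, dt, \qquad (7)$$

where $\nu = \nu(\zeta)$, $\mathbf{I}(\zeta) = (I_1, \ldots, I_\nu)$, $\mathbf{q}(\zeta) = (q_1, \ldots, q_\nu)$, $\boldsymbol{\eta}(\zeta) = (\eta_1, \ldots, \eta_\nu)$, and
$\xi = (\xi_1, \ldots, \xi_\nu)$.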

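If $\Phi$ is a switching cost function for $\Sigma$, and $\varphi$ is an endpoint cost function, then we associate
with $\Phi$ and $\varphi$ the functional $C_{\Phi,\varphi} : TCP(\Sigma) \to \mathbb{R} \cup \{+\infty\}$ that assigns to each
$\Xi = (\xi, \zeta) \in TCP(\Sigma)$ the number

$$C_{\Phi,\varphi}(\xi, \zeta) = \varphi_{q_\nu, q_1}(\partial\xi) + \sum_{j=1}^{\nu - 1} \Phi_{q_j, q_{j+1}}(\partial_j\xi), \qquad (8)$$

where $\nu = \nu(\zeta)$, and $(q_1, \ldots, q_\nu)$ is the switching strategy of $\zeta$.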

A hybrid Bolza cost functional for $\Sigma$ is an extended real-valued functional
$C : TCP(\Sigma) \to \mathbb{R} \cup \{+\infty\}$ such that $C = C_L + C_{\Phi,\varphi}$ for some $L, \Phi, \varphi$ that are, respectively, a Lagrangian, a
switching cost function, and an endpoint cost function for $\Sigma$.

Given a hybrid control system $\Sigma$, a Bolza cost functional $C$ for $\Sigma$, and an endpoint constraint
$\mathcal{E}$, we will consider the optimal control problem $\mathcal{P}(\Sigma, C, \mathcal{E})$, whose objective is to minimize $C(\xi, \zeta)$
in the class $TCP(\Sigma; \mathcal{E})$. We observe that the endpoint constraint sets $E_{q,q'}$ could all be of the
special form $E^0_{q,q'} \times \{b\} \times \{a\}$, where $a, b$ are fixed real numbers, independent of $q, q'$, and each
$E^0_{q,q'}$ is a subset of $M_q \times M_{q'}$. In that special case, all the members $\Xi = (\xi, \zeta)$ of $TCP(\Sigma; \mathcal{E})$
satisfy $a_\zeta = a$, $b_\zeta = b$, so we have a problem with fixed initial and terminal times. In addition,
the switching sets $S_{q,q'}$ could be of the form $S_{q,q'} = S^0_{q,q'} \times \{t_{q,q'}\} \times \{t_{q,q'}\}$, where $S^0_{q,q'} \subseteq M_q \times M_{q'}$
and the $t_{q,q'}$ are fixed real numbers, in which case we would be dealing with a problem with
fixed switching times and no clock resetting.
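
To illustrate how a hybrid Bolza cost is assembled, the following sketch evaluates it on sampled data. It rests on explicit assumptions: the state and control are scalar, each arc is given as sampled values, the Lagrangian integral is approximated by the trapezoidal rule, and the switching and endpoint part is read as the endpoint cost evaluated at the endpoint condition plus the sum of the switching costs evaluated at the jumps. All names and the example data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# One arc: (location q_j, sample times in I_j, samples of xi_j, samples of eta_j).
Arc = Tuple[str, List[float], List[float], List[float]]

def lagrangian_cost(arcs, L):
    """Trapezoidal approximation of sum_j int_{I_j} L_{q_j}(xi_j(t), eta_j(t), t) dt."""
    total = 0.0
    for q, ts, xs, us in arcs:
        vals = [L[q](x, u, t) for x, u, t in zip(xs, us, ts)]
        total += sum(0.5 * (vals[i] + vals[i + 1]) * (ts[i + 1] - ts[i])
                     for i in range(len(ts) - 1))
    return total

def switching_endpoint_cost(arcs, Phi, phi):
    """Endpoint cost at (xi_nu(b), xi_1(a), b, a) plus switching costs at the jumps."""
    first, last = arcs[0], arcs[-1]
    endpoint = (last[2][-1], first[2][0], last[1][-1], first[1][0])
    total = phi[(last[0], first[0])](endpoint)
    for a, b in zip(arcs, arcs[1:]):
        jump = (a[2][-1], b[2][0], a[1][-1], b[1][0])
        total += Phi[(a[0], b[0])](jump)
    return total

def bolza_cost(arcs, L, Phi, phi):
    return lagrangian_cost(arcs, L) + switching_endpoint_cost(arcs, Phi, phi)

# Two arcs: the state rises from 0 to 1 at location "a", then falls back to 0 at "b".
arcs = [
    ("a", [0.0, 0.5, 1.0], [0.0, 0.5, 1.0], [1.0, 1.0, 1.0]),
    ("b", [1.0, 1.5, 2.0], [1.0, 0.5, 0.0], [-1.0, -1.0, -1.0]),
]
L = {"a": lambda x, u, t: u * u, "b": lambda x, u, t: x}
Phi = {("a", "b"): lambda jump: 0.25}        # fixed price per switch
phi = {("b", "a"): lambda e: e[0] ** 2}      # penalize the terminal state
cost = bolza_cost(arcs, L, Phi, phi)
print(cost)  # 1.0 (arc a) + 0.5 (arc b) + 0.25 (switch) + 0.0 (endpoint) = 1.75
```

Because both integrands are linear or constant along the sampled arcs, the trapezoidal rule is exact here and the total is 1.75.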

7.1 The general form of the maximum principle 

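Let us assume that

A1. $\Sigma = (Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S})$ is a hybrid control system;

A2. $C = C_L + C_{\Phi,\varphi}$ is a hybrid Bolza cost functional for $\Sigma$;

A3. $\mathcal{E}$ is an endpoint constraint for $(Q, \mathcal{M})$;

A4. $\Xi^\#$ (the “reference trajectory-control pair”) belongs to $TCP(\Sigma; \mathcal{E})$, and

$$\Xi^\# = (\xi^\#, \zeta^\#), \qquad \xi^\# = (\xi_1^\#, \ldots, \xi_{\nu^\#}^\#),$$
$$\zeta^\# = (\mathbf{q}^\#, \mathbf{I}^\#, \boldsymbol{\eta}^\#), \qquad \mathbf{q}^\# = (q_1^\#, \ldots, q_{\nu^\#}^\#),$$
$$\mathbf{I}^\# = (I_1^\#, \ldots, I_{\nu^\#}^\#), \qquad \boldsymbol{\eta}^\# = (\eta_1^\#, \ldots, \eta_{\nu^\#}^\#).$$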

The maximum principle gives a necessary condition for $\Xi^\#$ to be a solution of $\mathcal{P}(\Sigma, C, \mathcal{E})$. The
result only depends on comparing trajectories with the same switching strategy, and does not
require the candidate arc $\Xi^\#$ to be a true solution. Moreover, even within the class of arcs
corresponding to a fixed switching strategy, only arcs that are close to $\Xi^\#$ are compared with
$\Xi^\#$. So we introduce the following definition.

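A local solution of a problem $\mathcal{P}(\Sigma, C, \mathcal{E})$ is a trajectory-control pair
$\Xi^\# = (\xi^\#, \zeta^\#) = (\xi_1^\#, \ldots, \xi_{\nu^\#}^\#, \zeta^\#)$ such that there exist neighborhoods
$\mathcal{N}_1, \ldots, \mathcal{N}_{\nu^\#}$ of the graphs of $\xi_1^\#, \ldots, \xi_{\nu^\#}^\#$ in
$M_{q_1^\#} \times \mathbb{R}, \ldots, M_{q_{\nu^\#}^\#} \times \mathbb{R}$ having the property that $\Xi^\#$ minimizes the cost $C(\Xi)$ in the class of
all the trajectory-control pairs $\Xi = (\xi, \zeta) = (\xi_1, \ldots, \xi_\nu, \zeta) \in TCP(\Sigma; \mathcal{E})$ such that $\mathbf{q}(\zeta) = \mathbf{q}(\zeta^\#)$
(so that, in particular, $\nu = \nu^\#$) and the graph $G(\xi_j)$ of $\xi_j$ is contained in $\mathcal{N}_j$ for $j = 1, \ldots, \nu^\#$.
(Here the “graph” of $\xi_j$ is the set

$$G(\xi_j) \stackrel{\mathrm{def}}{=} \{(\xi_j(t), t) : t \in \mathrm{Domain}(\xi_j)\}, \qquad (9)$$

so $G(\xi_j) \subseteq M_{q_j} \times \mathbb{R}$.)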

We now present the maximum principle for hybrid systems as a true “principle,” that is, a
not very precise mathematical statement that can be rendered precise in various ways, giving
rise to different “versions” of the principle. Two such versions (both completely precise and
rigorous) are stated in the papers [20], [21], and [22].

The maximum principle. Assume that A1-A4 hold, and $\Xi^\#$ is a local solution of $\mathcal{P}(\Sigma, C, \mathcal{E})$.
Then there exists an adjoint pair $(\psi, \psi_0)$ along $\Xi^\#$ that satisfies the weak Hamiltonian
maximization, nontriviality, and transversality conditions for $\mathcal{P}(\Sigma, C, \mathcal{E})$ along $\Xi^\#$.

To turn the above statement into a theorem, we have to specify technical assumptions on
the 12-tuple of data $(Q, \mathcal{M}, U, f, \mathcal{U}, \mathcal{S}, L, \Phi, \varphi, \mathcal{E}, \xi^\#, \zeta^\#)$, and assign a precise meaning to the
notions of “adjoint pair,” “weak Hamiltonian maximization,” “nontriviality,” and
“transversality.” This is done in the above papers.